Release 10.1A: OpenEdge Development: Programming Interfaces
Using word-break tables
You can create word-break tables that specify word separators using a rich set of criteria. Specifying and working with word-break tables involves the tasks described in the sections that follow.
Specifying word delimiter attributes
As mentioned previously, to break down the contents of a word-indexed field into individual words, Progress needs to know which characters delimit words and which do not. The distinction can be subtle and sometimes depends on context. For example, consider the function of the dot in the character strings in Table 1–5.
In the first character string, the dot functions as a decimal point and does not divide one word from another. Thus, you can query on the word “$25,125.95.” In the second character string, by contrast, the dot functions as a period, dividing the word “received” from the word “call.”
To help define word delimiters systematically while allowing for contextual variation, Progress provides eight word delimiter attributes, which you can use in word-break tables. The eight word delimiter attributes appear in Table 1–6.
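The contextual behavior described above can be illustrated with a small Python sketch. It is not Progress's word-break algorithm and does not use the word delimiter attributes of Table 1–6. It assumes a single hypothetical rule (a dot or comma stays inside a word only when a digit follows it), and the function name split_words and the sample strings are invented for illustration.

```python
def split_words(text):
    """Split text into words using one illustrative contextual rule.

    Assumption (not Progress's actual rules): letters, digits, and '$'
    always belong to a word; '.' and ',' belong to a word only when the
    next character is a digit; every other character delimits words.
    """
    words = []
    current = []
    for i, ch in enumerate(text):
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalnum() or ch == "$":
            current.append(ch)
        elif ch in ".," and current and next_ch.isdigit():
            # Dot or comma inside a number: keep it in the word.
            current.append(ch)
        else:
            # Any other character ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


# Hypothetical sample strings: one with a decimal point, one with a period.
print(split_words("Balance is $25,125.95"))
print(split_words("Shipment not received.Call customer."))
```

In the first string the dot is followed by a digit, so the amount survives as a single word. In the second string the dot is followed by a letter, so it separates the two neighboring words.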
Understanding the syntax of word-break tables
Word delimiter attributes form the heart of word-break tables, and you specify them using the following syntax:
symbolic-name
The name of a symbol.
For example: DOLLAR-SIGN
symbol-value
The value of the symbol.
For example: '$'
Note: Although some versions of Progress let you compile word-break tables that omit all items within the second pair of square brackets, Progress Software Corporation (PSC) recommends that you always include these items. If the source-code version of a compiled word-break table lacks these items, and the associated database is not so large as to make this unfeasible, PSC recommends that you add these items to the table, recompile the table, reassociate the table with the database, and rebuild the indexes.
codepage-name
The name, not surrounded by quotes, of the code page the word-break table is associated with. The maximum length is 20 characters.
For example: UTF-8
wordrules-name
The name, not surrounded by quotes, of the compiled word-break table. The maximum length is 20 characters.
For example: utf8sample
table-type
The number 2.
Note: Some versions of Progress allow a table type of 1. Although this is still supported, Progress Software Corporation (PSC) recommends, if feasible, that you change the table type to 2, recompile the word-break table, reassociate it with the database, and rebuild the indexes.
char-literal
A character within single quotes or a symbolic-name, which represents a character in the code page.
For example: '#'
hex-literal
A hexadecimal value or a symbolic-name, which represents a character in the code page.
For example: 0xAC
decimal-literal
A decimal value or a symbolic-name, which represents a character in the code page.
For example: 39
word-delimiter-attribute
The context in which the character is a word delimiter. You can use one of the eight word delimiter attributes listed in Table 1–6.
Examples of word-break tables
The following is an example of a word-break table for Unicode:
As the preceding example illustrates, word-break tables can contain comments delimited as follows:
For more examples, see the word-break tables that Progress provides in source-code form. They reside in the DLC/prolang/convmap directory and have the file extension .wbt.
Note: Progress supplies a word-break table for each code page it supports.
Compiling word-break tables
After you create or modify a word-break table, you must compile it with the PROUTIL utility. The syntax is as follows:
src-file
The name of the word-break table file to be compiled.
rule-num
A number between 1 and 255 inclusive that identifies this word-break table within your OpenEdge installation.
The PROUTIL utility names the compiled version of the word-break table proword.rule-num. For example, if rule-num is 34, PROUTIL names the compiled version proword.34.
Associating compiled word-break tables with databases
After you compile a word-break table, you must associate the compiled version with a database using the PROUTIL utility. The syntax is as follows:
database
The name of the database.
rule-num
The value of rule-num you specified when you compiled the word-break table.
To associate the database with the default word-break rules, set rule-num to zero.
Note: Setting rule-num to zero associates the database with the default word-break rules for the current code page. For more information on code pages, see OpenEdge Development: Internationalizing Applications.
Word-break tables for double-byte and UTF-8 databases
OpenEdge ships compiled word-break tables for double-byte and UTF-8 databases. The word-break tables are pre-applied to the appropriate empty databases. This means that word-indexing is available without any additional work for those databases.
The compiled word-break tables are proword.245 through proword.254. The tables are in your install-dir/prolang/language directory or subdirectory with the corresponding empty database.
If you already have a proword file that uses the same number as one of these new tables, you should change its number to a number less than 240. You can then apply the new word rule to your existing database. This change does not require an index build.
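The numbering rules in this chapter (rule numbers from 1 through 255, compiled tables named proword.rule-num, shipped tables numbered 245 through 254, and the advice to renumber conflicting tables below 240) can be collected in a short Python sketch. The function and constant names are hypothetical helpers for illustration only; they are not part of OpenEdge or PROUTIL.

```python
# Rule numbers used by the shipped double-byte and UTF-8 tables
# (proword.245 through proword.254).
SHIPPED_RULE_NUMBERS = range(245, 255)


def compiled_table_name(rule_num):
    """Return the compiled file name for a rule number between 1 and 255."""
    if not 1 <= rule_num <= 255:
        raise ValueError("rule-num must be between 1 and 255 inclusive")
    return f"proword.{rule_num}"


def conflicts_with_shipped_table(rule_num):
    """True if a custom table would use the same number as a shipped table."""
    return rule_num in SHIPPED_RULE_NUMBERS


def is_recommended_custom_number(rule_num):
    """True if the number follows the advice to stay below 240."""
    return 1 <= rule_num < 240


print(compiled_table_name(34))
print(conflicts_with_shipped_table(250))
print(is_recommended_custom_number(239))
```

A check like this could run before compiling a custom table, so that a number clash is noticed before the table is associated with a database.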
Rebuilding word indexes
For word indexing to work as expected, the word-break table Progress uses to write the word indexes (to add, modify, or delete a record that contains a word index) and the word-break table Progress uses to read word indexes (to process a query that contains the CONTAINS operator) must be identical. To ensure this, when you associate the compiled version of a word-break table with a database, Progress writes cyclical redundancy check (CRC) values from the compiled word-break table into the database. When you connect to the database, Progress compares the CRC values in the database to the CRC value in the compiled version of the word-break table. If they do not match, Progress displays an error message and terminates the connection attempt.
If a connection attempt fails and you want to avoid rebuilding the indexes, you can try associating the database with the default word-break rules.
Note: This might invalidate the word indexes and require you to rebuild them anyway.
To rebuild the indexes, you can use the PROUTIL utility with the IDXBUILD or IDXFIX qualifier.
The syntax of PROUTIL with the IDXBUILD qualifier is:
Operating system: UNIX, Windows
Syntax: proutil db-name -C idxbuild [ all ] [ -T dir-name ] [ -TB blocksize ] [ -TM n ] [ -B n ]
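As a rough illustration of the IDXBUILD syntax above, the following Python sketch assembles the argument list for a given database. It only builds the list and does not run anything. The function name, parameter names, and the database name mydb are hypothetical; the options are passed through exactly as the syntax shows, without interpreting their meaning.

```python
def build_idxbuild_args(db_name, all_indexes=False, temp_dir=None,
                        tb_blocksize=None, tm=None, b=None):
    """Assemble a proutil idxbuild argument list from the documented syntax.

    Optional values are added only when supplied, mirroring the square
    brackets in the syntax.
    """
    args = ["proutil", db_name, "-C", "idxbuild"]
    if all_indexes:
        args.append("all")
    if temp_dir is not None:
        args += ["-T", temp_dir]
    if tb_blocksize is not None:
        args += ["-TB", str(tb_blocksize)]
    if tm is not None:
        args += ["-TM", str(tm)]
    if b is not None:
        args += ["-B", str(b)]
    return args


# Hypothetical database name; prints the command as a single line.
print(" ".join(build_idxbuild_args("mydb", all_indexes=True)))
```

Keeping the arguments as a list rather than one string avoids quoting problems if the list is later handed to a process launcher.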
The syntax of PROUTIL with the IDXFIX qualifier is:
For more information on the PROUTIL utility, see OpenEdge Data Management: Database Administration.
Providing access to the compiled word-break table
To allow database servers and shared-memory clients to access the compiled version of the word-break table, the compiled table must reside either in the OpenEdge installation directory or in the location pointed to by the environment variable PROWDrule-num. For example, if the compiled word-break table has the name proword.34 and resides in the DLC/mydir/mysubdir directory, set the environment variable PROWD34 to DLC/mydir/mysubdir/proword.34.
Note: Although the name of the compiled version of the word-break table has a dot, the name of the corresponding environment variable does not.
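The naming convention for the environment variable can be sketched in Python as follows. The helper prowd_setting is a hypothetical name, and the sketch only derives the variable name and value from a rule number and a directory; how the variable is actually exported depends on your shell or startup scripts.

```python
import posixpath


def prowd_setting(rule_num, table_dir):
    """Return (variable name, value) for a compiled word-break table.

    The variable name has no dot (PROWD34), while the file name it
    points to does (proword.34).
    """
    name = f"PROWD{rule_num}"
    value = posixpath.join(table_dir, f"proword.{rule_num}")
    return name, value


# Directory taken from the example in the text.
name, value = prowd_setting(34, "DLC/mydir/mysubdir")
print(f"{name}={value}")
```

To make the setting visible to a process started from Python, the pair could be added to a copy of os.environ and passed to that process as its environment.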
Copyright © 2005 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095